One class random forests

نویسندگان

  • Chesner Désir
  • Simon Bernard
  • Caroline Petitjean
  • Laurent Heutte
چکیده

One class classification is a binary classification task for which only one class of samples is available for learning. In some preliminary works, we have proposed One Class Random Forests (OCRF), a method based on a random forest algorithm and an original outlier generation procedure that makes use of classifier ensemble randomization principles. In this paper, we propose an extensive study of the behavior of OCRF, that includes experiments on various UCI public datasets and comparison to reference one class algorithms – namely, gaussian density models, Parzen estimators, gaussian mixture models and One Class SVMs – with statistical significance. Our aim is to show that the randomization principles embedded in a random forest algorithm make the outlier generation process more efficient, and allow in particular to break the curse of dimensionality. One Class Random Forests are shown to perform well in comparison to other methods, and in particular to maintain stable performance in higher dimension, while the other algorithms may fail.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

One Class Splitting Criteria for Random Forests

Random Forests (RFs) are strong machine learning tools for classification and regression. However, they remain supervised algorithms, and no extension of RFs to the one-class setting has been proposed, except for techniques based on second-class sampling. This work fills this gap by proposing a natural methodology to extend standard splitting criteria to the one-class setting, structurally gene...

متن کامل

Random Forests for Big Data

Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involve massive data but they also often include data streams and data heterogeneity. Recently some statistical methods have been adapted to process Big Data, like linear regression models, clustering methods and bootstrapping schemes. Based o...

متن کامل

An Introduction to Random Forests for Multi-class Object Detection

Object detection in large-scale real-world scenes requires efficient multi-class detection approaches. Random forests have been shown to handle large training datasets and many classes for object detection efficiently. The most prominent example is the commercial application of random forests for gaming [37]. In this paper, we describe the general framework of random forests for multi-class obj...

متن کامل

Customer churn prediction using improved balanced random forests

Churn prediction is becoming a major focus of banks in China who wish to retain customers by satisfying their needs under resource constraints. In churn prediction, an important yet challenging problem is the imbalance in the data distribution. In this paper, we propose a novel learning method, called improved balanced random forests (IBRF), and demonstrate its application to churn prediction. ...

متن کامل

Fault Locating in High Voltage Transmission Lines Based on Harmonic Components of One-end Voltage Using Random Forests

In this paper, an approach is proposed for accurate locating of single phase faults in transmission lines using voltage signals measured at one-end. In this method, harmonic components of the voltage signals are extracted through Discrete Fourier Transform (DFT) and are normalized by a transformation. The proposed fault locator, which is designed based on Random Forests (RF) algorithm, is train...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Pattern Recognition

دوره 46  شماره 

صفحات  -

تاریخ انتشار 2013